SimCTC: A Simple Contrast Learning Method of Text Clustering (Student Abstract)


Abstract

This paper presents SimCTC, a simple contrastive learning (CL) framework that greatly advances state-of-the-art text clustering models. In SimCTC, a pre-trained BERT model first maps the input sequence to the representation space, which is then followed by three different loss function heads: a Clustering head, an Instance-CL head, and a Cluster-CL head. Experimental results on multiple benchmark datasets demonstrate that SimCTC remarkably outperforms 6 competitive methods, with a 1%-6% improvement in Accuracy (ACC) and a 1%-4% improvement in Normalized Mutual Information (NMI). Moreover, our results also show that performance can be further improved by setting an appropriate number of clusters in the cluster-level objective.
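The abstract reports results in ACC and NMI but does not say how they are computed. The sketch below assumes the definitions commonly used in clustering evaluation: ACC as the accuracy under the best one-to-one mapping between cluster ids and class ids (found here by brute force over permutations, which is only practical for a small number of clusters), and NMI as mutual information divided by the arithmetic mean of the two label entropies. The function names are illustrative and do not come from the paper.

```python
import math
from collections import Counter
from itertools import permutations


def clustering_accuracy(true_labels, pred_labels):
    """Accuracy under the best one-to-one mapping of cluster ids to class ids."""
    pair_counts = Counter(zip(pred_labels, true_labels))
    clusters = list(set(pred_labels))
    classes = list(set(true_labels))
    best_hits = 0
    if len(clusters) <= len(classes):
        # Assign each cluster a distinct class and count the agreements.
        for assigned in permutations(classes, len(clusters)):
            hits = sum(pair_counts[(c, t)] for c, t in zip(clusters, assigned))
            best_hits = max(best_hits, hits)
    else:
        # More clusters than classes: assign each class a distinct cluster.
        for assigned in permutations(clusters, len(classes)):
            hits = sum(pair_counts[(c, t)] for c, t in zip(assigned, classes))
            best_hits = max(best_hits, hits)
    return best_hits / len(true_labels)


def entropy(counts, n):
    """Shannon entropy (natural log) of a label distribution given its counts."""
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def normalized_mutual_information(true_labels, pred_labels):
    """NMI with arithmetic-mean normalization (an assumed variant)."""
    n = len(true_labels)
    true_counts = Counter(true_labels)
    pred_counts = Counter(pred_labels)
    joint_counts = Counter(zip(true_labels, pred_labels))
    mutual_info = sum(
        (c / n) * math.log(c * n / (true_counts[t] * pred_counts[p]))
        for (t, p), c in joint_counts.items()
    )
    mean_entropy = (entropy(true_counts, n) + entropy(pred_counts, n)) / 2
    # Both labelings are a single group: treat them as identical partitions.
    return mutual_info / mean_entropy if mean_entropy > 0 else 1.0


if __name__ == "__main__":
    true = [0, 0, 1, 1, 2, 2]
    pred = [1, 1, 0, 0, 2, 2]  # same partition, cluster ids permuted
    print(clustering_accuracy(true, pred))
    print(normalized_mutual_information(true, pred))
```

Both metrics ignore the arbitrary numbering of clusters, which is why a prediction that merely permutes the cluster ids still counts as a perfect match. For larger numbers of clusters, the brute-force search would normally be replaced by the Hungarian algorithm.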


Similar resources

Silhouette + attraction: A simple and effective method for text clustering

This article presents Sil-Att, a simple and effective method for text clustering, which is based on two main concepts: the silhouette coefficient and the idea of attraction. The combination of both principles allows us to obtain a general technique that can be used either as a boosting method, which improves results of other clustering algorithms, or as an independent clustering algorithm. The ...


Clustering Student Learning Activity Data

We show a variety of ways to cluster student activity datasets using different clustering and subspace clustering algorithms. Our results suggest that each algorithm has its own strengths and weaknesses, and can be used to find clusters with different properties. Many education datasets are by nature high dimensional. Finding coherent and compact clusters becomes difficult ...


A Supervised Clustering Method for Text Classification

This paper describes a supervised three-tier clustering method for classifying students' essays of qualitative physics in the Why2-Atlas tutoring system. Our main purpose of categorizing text in our tutoring system is to map the students' essay statements into principles and misconceptions of physics. A simple 'bag-of-words' representation using a naive Bayes algorithm to categorize text was un...


A Simple Text-line segmentation Method

Text line segmentation is an important step because inaccurately segmented text lines will cause errors in the recognition stage. The nature of handwriting makes the process of text line segmentation very challenging. Text characteristics can vary in font, size, orientation, alignment, color, contrast, and background information. These variations make the process of word detection complex and ...


Learning To Identify Student Preconceptions From Text

Automatic classification of short textual answers by students to questions about topics in physics, computing, etc., is an attractive approach to diagnostic assessment of learning. We present a language for expressing rules that can classify text based on the presence and relative positions of words, lists of synonyms and other abstractions of a single word. We also describe a system, based on ...



Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2022

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v36i11.21635